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Semiparametric models are often considered for analyzing longi- 
tudinal data for a good balance between flexibility and parsimony. In 
this paper, we study a class of marginal partially linear quantile mod- 
els with possibly varying coefficients. The functional coefficients are 
estimated by basis function approximations. The estimation proce- 
dure is easy to implement, and it requires no specification of the error 
distributions. The asymptotic properties of the proposed estimators 
are established for the varying coefficients as well as for the constant 
coefficients. We develop rank score tests for hypotheses on the coef- 
ficients, including the hypotheses on the constancy of a subset of the 
varying coefficients. Hypothesis testing of this type is theoretically 
challenging, as the dimensions of the parameter spaces under both 
the null and the alternative hypotheses are growing with the sample 
size. We assess the finite sample performance of the proposed method 
by Monte Carlo simulation studies, and demonstrate its value by the 
analysis of an AIDS data set, where the modeling of quantiles pro- 
vides more comprehensive information than the usual least squares 
approach. 

1. Introduction. Various nonparametric models have been developed for 
longitudinal data analysis. One popular nonparametric specification is the 
varying coefficient model, where the coefficients are smooth nonparamet- 
ric functions of some factors such as measurement time. Varying coefficient 
models were proposed by Hastie and Tibshirani [8], and later extended to 
longitudinal studies by Chiang, Rice and Wu [3], Huang, Wu and Zhou 
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[13], and Qu and Li [21], among others. Though flexible, the varying co- 
efficient models may overfit data when some covariate effects are indeed 
time-invariant. This motivates the partially linear varying coefficient model 
(PLVC) 

(1.1) y = x'a(t) + z'[3 + e, 

where a(t) comprises p unknown smooth functions, /3 is a (/-dimensional 
parameter vector, and e is the random error. 

We consider PLVC models for longitudinal data due to their good balance 
between flexibility and parsimony. The PLVC models have been studied by 
Ahmad, Leelahanon and Li [1] and Fan and Huang [4] for cross-sectional 
data, and by Sun and Wu [23] and Fan, Huang and Li [5] for longitudinal 
data. The current literature is mainly confined to estimating the conditional 
mean function of y. The focus of this paper is to estimate and conduct 
inference on the conditional quantile curves without any specification of the 
error distribution or intra-subject dependence structure. 

Suppose that we have n subjects, and the ith subject has m« > 1 repeated 
measurements over time. At a given quantile level r G (0, 1), we assume the 
following PLVC quantile regression model: 

(1.2) ytj = x' i:j a(tij,T) + z' i:j P(T) + e^r), i = 1, . . . ,n, j = 1, . . . 

where yij denotes the jth outcome of the ith subject, a(t,r) = («i(t,r), . . . , 
a p (t, t))' are unknown smooth functions of t for all t £ R, Xij = {xij i, . . . , 
x ij,pY ^ an d Zij = (zij t i, . . . , Zij t q)' S K 9 are the design vectors for the 
time- varying coefficients and constant coefficients, respectively, and ey(r) 
is the random error whose rth quantile conditional on (x,z,t) equals zero. 
We assume that the observations, and therefore ey, are dependent within 
the same subjects, but independent across subjects. The form of the error 
distribution and the intra-subject dependence structure are left unspecified. 

The quantile PLVC model is a valuable alternative to the conditional 
mean models for analyzing longitudinal data. First, fitting data at a set of 
quantiles provides a more comprehensive description of the response distri- 
bution than does the mean. In many applications, the functional impacts 
of the covariates on the response may vary at different percentiles of the 
distribution. For instance, the analysis of an AIDS data set in Section 6 
shows that the effect of the initial measurement on CD4 percentage is time- 
decaying for severely ill people (in the lower tail of the CD4 distribution), 
whereas it tends to be constant in the upper quantiles. Such important 
features would be overlooked by the regression approaches that focus only 
on the mean or the median. Second, the modeling of different conditional 
quantiles can be used to construct prediction intervals; see, for instance, [2] 
and [19] . Third, when the center of the conditional distribution of y is of in- 
terest, the median regression, a special case of quantile regression, provides 
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more robust estimators than the mean regression. In addition, quantile re- 
gression does not assume any parametric form on the error distribution, and 
thus is able to accommodate nonnormal errors, as often seen in longitudinal 
studies. 

Even though linear quantile regression has been well developed, theory 
and methodology for nonparametric and semiparametric quantile models 
are lagging. Hendricks and Koenker [11], Yu and Jones [26], among oth- 
ers, studied nonparametric quantile regression for independent observations. 
Recently, several authors studied quantile regression for varying coefficient 
models. Cai and Xu [2] considered local polynomial estimators for time se- 
ries data. Honda [12] and Kim [15] studied varying coefficient models for 
independent data using local polynomials and splines, respectively. 

The present paper is the first to develop theory and methodology for 
analyzing longitudinal data in the quantile PLVC models. Koenker [17] and 
Lipsitz et al. [18] discussed quantile regression for longitudinal data in linear 
models. Wei and He [25] studied semiparametric quantile autoregression 
models for longitudinal data, where the errors are treated as independent 
in the hypothesis testing. 

In this paper, we propose to estimate the quantile smooth coefficients us- 
ing basis function approximations. Under some regularity conditions, we ob- 
tain the optimal convergence rate of the functional coefficient curves a(t, r), 
and establish the asymptotic normality of (3{t). Even though a Wald-type 
test can be constructed using the asymptotic distribution of /3(r), its finite 
sample performance is very sensitive to the estimation of the error density 
function evaluated at the quantile of interest. To avoid this problem, we 
develop a quantile rank score test for inference on /3(r), and show that it is 
superior to the Wald-type test in finite samples. Focusing on a varying coeffi- 
cient model, Kim [15] implemented a similar inference procedure for testing 
the hypothesis that all of the varying coefficients are constant. However, 
a question of more practical use is to test whether a particular covariate or 
a subset of covariates have time- varying effects. Extensions of the rank score 
test to this type of hypothesis testing problems are theoretically challenging, 
because the dimensions of the parameter spaces under both the null and the 
alternative hypotheses are increasing with the sample size. We provide the 
asymptotic results under this practically relevant setting, making it possible 
to test the constancy of the coefficients one at a time, a necessary step in 
selecting varying-coefficient models of different complexities. 

In Section 2, we present the proposed estimation procedure and the large 
sample properties of the resulting estimators. We develop inferential pro- 
cedures for testing /3(r) in Section 3, and for testing the constancy of a 
subset of a;(t,r)'s in Section 4. We assess the finite sample performance of 
the proposed procedures with simulation studies in Section 5. The merit of 
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the proposed methods is illustrated by analyzing a CD4 depletion data in 
Section 6. Section 7 concludes the paper with some discussion. Proofs are 
deferred to the Appendix. 

2. The proposed method. 

2.1. Estimation. For the ease of presentation, we will omit r from ati(t, r), 
P(t) and Gij(r) in model (1.2) wherever clear from the context, but we should 
bear in mind that those quantities are r-specific. Without loss of generality, 
we assume that Uj G [0, 1] for all i and j throughout. 

Let 7f(t) = (.Bi(i), . . . , Bk n +h+i(t))' be a set of B-spline basis functions 
of order h + 1 with k n quasi- uniform internal knots. We approximate each 
ol\ (t) by a linear combination of normalized B-spline basis functions ol\ (t) ~ 
J2 k s=t h+1 B s {t)0 hs = n(t)'0i, where 9 l = (0 M , . . . , 9i,k n+ h+i)' is the spline co- 
efficient vector. For details on the construction of B-spline basis functions, 
the readers are referred to Schumaker [22]. With the B-spline basis, model 
(1.2) can be approximated by 

p k„+h+l q 

y*j~E E x ij,i B s(tij)0i,s + ^2zij tS (3 s + eij =n^G + z' i:j P + eij, 

1=1 s=l s=l 

where Uij = (x^itt^, . . . , x ljtP Tr' ij )' £ , G = (0 J)S ) E M Pfc ™ with p kn = p(k n + 

h + 1), and iiij = ir(tij). The quantile coefficient estimates © and can be 
obtained by minimizing 

n rrii 

(2.i) 

i=\o=i 

where p T (u) = u{t — I(u < 0)} is the quantile loss function. This estimation 
method is a one-step procedure applicable to cases where the measurements 
are either regularly or irregularly observed. 

In practice, we choose lower order splines, such as h = 2 or h = 3 corre- 
sponding to quadratic or cubic splines. For demonstration, we assume that 
the number of internal knots, k n , is the same for each varying coefficient, 
even though it is allowed to vary in real applications; see Section 5.1 for a 
model selection criterion for determining k n . 

2.2. Large sample properties. Before presenting the main asymptotic re- 
sults, we first introduce two definitions. 

Definition 1 . Define 7i r as the collection of all functions on [0, 1] whose 
mth order derivative satisfies the Holder condition of order v with r = m + v. 
That is, for any h G fi r , there exists a constant c G (0, oo) such that for each 
heH r , \hW(s)-hW(t)\ <c\s-t\ u , for any < s,t < 1. 
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Definition 2. The function g(x, t) is said to belong to the varying coef- 
ficient class of functions y if (i) g(x, t) = x'h(t) = 2f=i xihi(t); (ii) Y%=i E{xi x 
hi(t)} 2 < co, where xi and /i;(t) G W r are the Zth coordinates of x and /i(i), 
respectively, I = 1, . . . ,p. 

For convenience, we define Z{ = (zn, . . . , Zi mi )' as the nriiXq design matrix 
on the ith subject, and Z = (Z[, . . . , Z' n )'. Similarly, denote Xi = (xn, . . . , Xi mi )' , 
T i = (t il ,...,t imi y^ mi ,X = (X[,...,X' n y and r=(T{,...,T/J'. Let Z lyl 
be the Zth column of Zi, be the cumulative distribution function (CDF) 
and fij be the density function of conditional on (xij,Zij,tij). We denote 
B< = diag(/ a (0), . . . , / imi (0)) and B = diag(B 1; . . . , B n ). 

To obtain the asymptotic distribution of f3, we first need to adjust for the 
dependence of Z and (X,T), which is a common complication in semipara- 
metric models. Similar to [1], we denote 4>(xij,tij) = Sf=i x ij,ihi(tij) £ y and 
<f>(Xi,Ti) = (4>(xn,tn), . . . ^(Xim^timj)' '. Let 

n 

(2.2) ^(.,.) = ^ g M^E[{Z i)l -<l>(X i ,T i )yB i {Z itl -<f>(X i ,T^}] 

i= i 

and mi(x, t) = E(Z^i\X = x,T = t), I = 1, . . . ,q. Note that 

n 

J2 E [{ z i,i - MXuTdYBiiZu - <P{X u T t )}} 

i=X 

n 

= J2E[{Z i>l -m l (X i ,T i )}'B i {Z i>l -m l (X i ,T i )}] 
i=i 

n 

+ E[{ mi {X t ,T t ) - <f>(Xi,Td}' 'BiimiiXuTi) - <f>(X i} Ti)}]. 

i=l 

Therefore, (f>*(Xi,Ti) are the projections of m;(Xj,Tj) onto the varying co- 
efficient functional space y (under the L2-norm). In other words, </>J*(X,, T,) 
is an element that belongs to y and it is the closest function to m/(Xj,Tj) 
among all the functions in y, for any I = 1, . . . ,q. In (2.2), we consider the 
weighted projection to account for the heteroscedasticity through Bj. 
We define 

K n = Y,{Z l - <f>*{X h TC,yBi{Zi - <t>*{X h Ti)}, 

i 

A n = Y,{Z l - 0*(X i ,T i )}'i4 i (A){Z i - </>*(Xi,Ti)}, 

i 

Ai(A) = Cov(^ r (e,i), ■ • ■ ,Vv(ei mi )), 

where </>*(Xi,Ti) = (X u T 4 ), . . . , ^pQ,^)) and ^ T {u) =r- I(u < 0). The 
Ai(A) is an rrii x m; symmetric matrix with the (j 1,^2) -element r — r 2 if 
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ji = j 2 , and 5 ijlj2 - r 2 if j 1 / j 2 , where 5 ijlj2 = P(e ijl (r) < 0, e^ 2 (r) < 0) 
measures the tail dependence of a pair of residuals from the same subject, 
and A is the collection of all 5ij 1 j 2 , s. The following assumptions are needed 
to obtain the asymptotic properties of a(t) and (5. 

Al For some r > 1, ai(t) £ TC r , I = 1, . . . ,p. 

A2 The conditional distribution of T given X = x has a bounded density 
of fx\x satisfying < c\ < fT\x(A x ) — c 2 < °°j uniformly in x and t for 
some positive constants c\ and C2. 

A3 The numbers of measurements mi are uniformly bounded for all % = 
l,...,n. 

A4 Uniformly over i and j, fij(-) is bounded from infinity, and it is bounded 
away from zero and has a bounded first derivative in the neighborhood 
of zero. 

A5 For all i and j, the random design vectors X-ij ■ Zij Eire bounded in prob- 
ability. 

A6 Let N = Ya=i m * denote the total number of observations. The eigen- 
values of N^Kn and N^A n are bounded away from infinity and zero 
for sufficiently large n. 

Theorem 1. Under assumptions A1-A6, ifr>\ and k n ~ n 1 ^ 2r+1 \ 
then 

-■ n nii 

( 2 - 3 ) n £ E^C^) " = P (^" 2r/(2r+1) )> I = 1, • • • ,P, 

8=1 j = l 

and 

(2.4) A-^^^-^^JV^/g). 

Throughout, we use a n ~ 6 n to mean that a n and b n have the same order as 
n — > co. 

Assumption Al is required to achieve the optimal convergence rate of 
&i(t). Assumption A2 ensures that k n Ii'BJJ is positive definite for suffi- 
ciently large n, which is needed for proving the consistency of the estimators. 
Assumptions A3 and A4 are standard assumptions used in longitudinal stud- 
ies and quantile regression, respectively. The boundness assumption in A5 
is made for convenience. It suffices to assume that Xij and Zij have bounded 
fourth moments, but this will complicate the technical proof. Assumption 
A6 is used to represent the asymptotic covariance matrix of (3 and to ob- 
tain the optimal convergence rate of the estimators of the parametric and 
nonparametric parts. 
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3. Inference on nonvarying coefficients. In this section, we propose two 
large sample inference procedures for testing the nonvarying coefficients (3, 
including a Wald-type test and a rank-score-based test. 

3.1. Wald-type test. Based on the asymptotic normality (2.4), a Wald- 
type test can be constructed for inference on (3 through direct estimation 
of the covariance matrix, which involves the unknown error density fij(0). 
Koenker [16] discussed several ways to estimate fij(0) and provided recom- 
mendations for selecting the smoothing parameters. In this paper, we adopt 
the idea of Hendricks and Koenker [11] and estimate fij(0) by the difference 
quotient, 

fij(0) = 2e n [x' ij {a(t ij ,T + e n ) - d(%,r- e n )} 

(3-1) 

+ z' l] {0{T + e n )-l3{r-e n )}]-\ 

where e n is a bandwidth parameter tending to zero as n — > oo . Throughout 

our numerical studies, we choose e„ = 1.57n- 1 / 3 (1.50 2 {$- 1 (r)}/[2{$- 1 (r)} 2 + 
l]) 2 / 3 following Hall and Sheather [7], where <&(•) and <p(-) are the CDF and 
density function of the standard normal distribution. 
Denote 

Bi = diag{/ 4l (0), . . . , / imi (0)}, B = diag(Bi, . . . , B n ), 

(3.2) n = (nn, • • • ,ni mi , . . . ,n nmn )', 

P = U(W BU^W B , Z* = (I-P)Z, 
and let Z* be the rows of Z* corresponding to the ith subject. 

Theorem 2. Let e^ = yij - x'^ait^) - z-j/3, = (en, e imi )' , 

n n 

(3.3) k n = Y J Zr^{e i W{e l )Zl K n = £ Z? . 

i=i i=i 

If £n —> 0, liminfji^oo ne 2 > and the assumptions of Theorem 1 hold, then 
N-^Kn - K n ) = op(l) and iV- 1 (A n - A n ) = o p (l). 

3.2. Rank score test. Theorem 2 shows that the covariance matrix of (3 
can be estimated consistently. However, the finite sample performance of the 
Wald-type test is sensitive to the choice of e n . As an alternative, we consider 
a quantile rank score test, which was proposed by Gutenbrunner et al. [6] 
for linear models. 

We partition (3 into two parts f3\ G M Ql and @2 £ ^- q2 with q\ + qi = q. 
Suppose we want to test Hq : j3\ = 0. Let Z^ be the N x q\ and Z^ be the 
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N x q<i design matrices corresponding to f3\ and /?2, respectively. Further- 
more, we denote W = (II, Z^), P ro = WfW'BW)" 1 ^, V = (I - P m )Z^\ 
and (ft = (O',/^)'- In practice, the weight matrix B can be estimated by B 
as defined in (3.2), and this will not affect the asymptotic behavior of the 
rank score test statistics to be proposed in the following and in Section 4. 

Let dij G R 91 and Wij G W k n+Q2 j-, e the column components of V and W 
associated with the jth measurement of the ith subject, respectively, and 
T>i = (dn, . . . , d irn y . We define the rank score test statistic as 

(3.4) T n = S' n V- x S n , 

where 

S n = N~ 1/2 dijip T (eij), iij = yij - zu'ijct), 

n 

4>= argmmJ2pr(yij-ru' ij( ft), V n = N' 1 J2^iM^U^i 

and 

Vv(e;) = (tp T (eu), . . . ,ift T (e imi ))' . 

To establish the asymptotic distribution of T n , we modify the assumption 
Al as Al' and make an additional assumption A7, 

Al' There exists some r > 2 such that ai(t) G H r ,l = 1, . . . ,p. 
A7 The minimum eig envalue of V n = N' 1 Ya=i £>-^(A)£>i is bounded away 
from zero for sufficiently large n. 

Theorem 3. If assumptions Al' and A2-A7 hold, and n 1 /^ k n 
n 1 / 4 with a n <C b n meaning a n = o(b n ), we have T n — > x 2 (qi) as n ~> 00 ■ 

Remark 1. The finite sample efficiency of V n could be debilitated by 
imposing a fully nonparametric structure to the correlation matrix. If the 
structure of Ai (A) is known, we can estimate A empirically with A by in- 
corporating information across subjects. Plugging A to V n , we obtain an 
asymptotically equivalent estimator V2 n = N^ 1 J27=i l^f A(h)T*i. Suppose, 
for instance, that Ai(A) has a compound symmetry structure with d~ij 1 j 2 = 5. 
The parameter 5 can be estimated consistently by 5 = L^ 1 J2i J2j 1 ^j 2 < 
0,iij 2 < 0), where L denotes the total number of pairs of repeated measure- 
ments from the same subject. For this special V-2 n involves the esti- 
mation of only one nuisance parameter 5, it is expected to be more efficient 
than V n in finite samples. The same discussion applies to the estimation of 
the covariance matrix of s n in (4.4). 
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4. Constancy of varying coefficients. In semiparametric models, another 
question of practical interest is to test whether one or some of the varying 
coefficients is constant. We propose a rank-score-based procedure for testing 
such hypotheses through model re-parameterization. Without loss of gener- 
ality, consider testing the null hypothesis that the first 1 < p\ < p coefficient 
functions a;(-) are constant: 

(4.1) H :ai(t)=-fi, l = l,..., Pl , 

for all t G [0, 1] , where 7; are unknown constants, versus the alternative 
hypothesis 

(4.2) Hi : one or more functions ai(t),l = 1, . . . ,px, are time-varying. 
We can find transformation matrices G and G such that 

Gn i j = (l,n' ij )', GU l3 = (U^,uf), 

where n\f = (xg^., . . ■ , n'^)' G and u\f G W^~p\ p 1 =pi{k n + 
K). Let 

n«-m (1) n (1) Y u^-m {2) n (2) )' 

Denote £1 G W 1 and £2 G M Pkn ~ Pl as the parameters corresponding to the 
design matrices II^ 1 ) and II^ 2 ), respectively. The rth conditional quantile of 
Dij can then be approximated by 

(4.3) Q yij (r\ Xij , Zij , ttj) = ng^i + ng } '6 + 4/3 

and (4.1) and (4.2) can be represented as Hq :£i = versus H\:^\^ 0. 
The rank score test is based on the quantile estimates (p = (£2,$')' of 92 = 
obtained under H Q . Let W = {I^ 2 \Z), P w = W(W'BWy x W'B, 
D = (I — P^)!^ 1 ), and let Wij and dij be the column components of W 
and D associated with the jth measurement of the zth subject, respectively. 
Furthermore, we denote D{ = (dn, . . . , di mi )' . Note that the dimensions of 
Wij and dij are both increasing in the dimension of the B-spline space. We 
denote the rank score test statistic as 

(4.4) t n = s n v n s n , 

where s n = N~ 1 / 2 £? =1 Y™±x d %3 Mh 3 ), = Vij ~ njf £2 - Aft = Vij - 
w'ijip, v n = N~ l J2i=iD' i 'ipT(ei)'tp T (ei)'Di and ip T (ei) = (ip T (e*i ),..., i> T {h mi ))' ■ 
If k n = k is bounded corresponding to the parametric model (4.3), a rank 
score test based on t n and the x 2 {Pi{k + fy) reference distribution can be 
used to test Hq : £1 = 0; see [24]. However, to test the constancy of functional 
coefficients in the PLVC models, this rank score test will be inconsistent 
unless k n grows with the sample size. The following Theorem 4 gives the 
asymptotic null distribution of the rank score test statistic t n for growing 
k n . 
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Theorem 4. Assume that Al' and A2-A6 hold, h>3, and the number 
of knots satisfies n 1 ^ 2r+2 ^ <C /c n ^ n 1 /^ . Furthermore, assume that the min- 
imum eigenvalue of the matrix v n = N~ l Ya=i D[Ai{&)Di is bounded away 
from zero for sufficiently large n. Under Hq, we have 

tn - (k n + K)pi D 



^/2{k n + h)pi 



N(0, 1) as k n — > oo. 



Remark 2. In the special homoscedastic case, where the errors have a 
common density with fij(0) = /(0) for all i and j, the weight matrix B will 
be canceled out in the projection matrices such as P ro and P w . Therefore, 
the rank score test statistics (3.4) and (4.4) reduce to simpler forms free of /. 
However, for heteroscedastic errors, the matrix B needs to be incorporated 
appropriately in the projection matrices to ensure the orthogonality of T> 
and BW, D and BW , and the sandwich formula in the covariance matrix 
of j3. 

Kim [15] proposed a similar inference procedure to test whether all of 
the coefficients are constant. Our developed testing procedure has wider ap- 
plications, as it can examine the constancy of a subset of possibly varying 
coefficients, and thus can be used for model selection. Note that in Kim 
[15], the dimension of nuisance parameters under Hq, £2, is finite. In our 
setup, both the dimension of nuisance parameters and that of parameters 
to be tested, £i, are allowed to increase with sample size n. The increas- 
ing dimension of parameters poses challenges to establish the asymptotic 
representation of s n , which is needed to prove Theorem 4. 

5. Simulation study. In this section, we investigate the finite sample per- 
formance of the proposed estimation and inference methods with Monte 
Carlo simulation studies. We generate 2000 data sets, each consisting of n 
subjects. We consider two sample sizes n = 30 and n = 100. Each subject 
is supposed to be measured at scheduled time points {0, 1,2, ... , 10}, each 
of which (excluding time 0) has a 20% probability of being skipped. This 
results in different numbers of repeated measurements mi for each subject. 
The actual measurement times are generated by adding an U(— 0.5,0.5) ran- 
dom variable to the nonskipped scheduled times. We focus on r = 0.25 and 
r = 0.5 in this study. 

The data sets are generated from the following heteroscedastic model 

Vij = a (tij) + ai(tij)xij.i + a 2 {tij)Xijfl + at3(tij)xij 3 

(5.1) 

+ (3zi + (l + \ 

where £jj,2 and follow the standard normal, [7(^/10,2 + ty/10), 
and the standard exponential distributions, respectively, Zi is a time-invariant 
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covariate generated from a Bernoulli distribution with 0.5 probability of be- 
ing 1, ejj(r) = &ij — i ? ~ 1 (r) with F being the common CDF of e,j. Here, 
.F _1 (t) is subtracted from to make the rth quantile of ej,(r) zero for 
identifiability purpose. 

We consider three cases for generating e,j. In cases 1 and 2, e« = (e^i, . . . , ej mi 
follows a multivariate normal distribution iV(0,Sj). In case 3, ej is gener- 
ated from a multivariate i(3) distribution, specifically ej = \/3u _1//2 e, where 
e ~ iV(0, Sf), u consists of mi independent x 2 (3) random variables, and e 
and u are mutually independent. The covariance matrix Sj has an exchange- 
able correlation structure with correlation coefficient of g = 0.8 in cases 1 
and 3, while an AR(1) correlation structure with g{eij 1 ,eij 2 ) = 0.8™i~ 
in case 2. We set (3 = 1 and 

a (t) = 15 + 20sin^ 

(5.2) ai (t) = 2-3cos( (3t ~ 5 25)7r ), 

a 2 (t) = 6 - 0.6t, a 3 (t) = -4 + (20 - 3t) 3 /1000. 

5.1. Estimation. Throughout our numerical studies, we use the cubic 
splines with h = 3 in the B-spline approximation. For each data set, we 
choose k n as the minimizer to the following Schwarz-type information crite- 
rion, 

sic(fc) = log j^> r ( yii - w tJ e {k) - z[p {k) ) 

lo&N 

+ ^{p(K + k + l)+q}, 

where p = 4, 9 = 1, and are the rth quantile estimators obtained 
from minimizing (2.1) with k knots; see He, Zhu and Fung [10] for a similar 
criterion for knots selection. 

To demonstrate the flexibility and efficiency of the PLVC model, we com- 
pare its performance to the linear constant coefficient (LCC) model 

Uij = «o + aiXij t x + a 2 Xi^ 2 + 013X13,3 + @Zi + ey. 

Let /3plvc and Plcc denote the quantile estimation of (3 obtained under the 
PLVC and the LCC models, respectively. For fair comparison, we also gener- 
ate 2000 data sets from the LCC model with (ao, oti, a 2 , 03) = (15,2,6,-4) 
for each case. Table 1 summarizes the estimated mean squared error (MSE) 
and bias of /3plvc and Plcc at t = 0.5 and n = 100. When the data is gen- 
erated from a PLVC model, fitting the LCC model leads to less efficient 
estimations. The PLVC model is more flexible and it does not lose much 
efficiency even when the underlying model has constant coefficients. 



12 



H. J. WANG, Z. ZHU AND J. ZHOU 



5.2. Inference on (3. We vary (3 in the PLVC model (5.1) from to 2.5 
to assess the type I error and power of the proposed tests. We consider two 
variants of the proposed rank score test: QRS and QRS^. The QRS does 
not specify the dependence structure of errors and is based on the covari- 
ance estimator V n . The QRS^ is based on Vi n assuming an exchangeable 
intra-subject correlation. For comparison, we also include the Wald-type 
test (Wald), and the mean regression method of Huang, Wu and Zhou [13] 
based on bootstrap of 500 samples, referred to as LSE. The comparison be- 
tween LSE and the other three methods is possible only at r = 0.5 when the 
mean and median functionals are the same. 

Table 2 summarizes the type I errors of the four methods in cases 1-3. The 
Wald-type test relies heavily on the estimation of the error density function. 
For all the cases considered, Wald fails to maintain the nominal significance 
level of 0.05, especially at re = 30. The other three tests all maintain the 
level reasonably well. Figure 1 shows the power curves of QRS, QRS^ and 
LSE in cases 1 and 3. The rank score tests QRS and QRS,5 give very similar 
power at both n = 30 and re = 100 in this simulation study for inference on 
the scalar (3. Another power comparison (not reported due to space limit) 

Table 1 

The Monte Carlo mean squared error (MSE) and bias of /3plvc and /3lcc in cases 1-3 

at r = 0.5 and n = 100 

Underlying model: PLVC Underlying model: LCC 



MSE Bias MSE Bias 





/^PLVC 


/^LCC 




/^LCC 


/^PLVC 


/^LCC 


/^PLVC 


/^LCC 


Case 1 
Case 2 
Case 3 


0.111 
0.073 
0.156 


0.187 
0.160 
0.336 


0.001 
0.001 
-0.014 


-0.001 
-0.007 
-0.016 


0.111 
0.072 
0.154 


0.111 
0.073 
0.154 


0.001 
-0.002 
-0.016 


0.001 
-0.003 
-0.014 



Table 2 

Type I errors for testing Hq : (3 = 0. The nominal significance level is 0.05 



Case 


T 




n = 


30 






n = 


100 




QRS 


QRS, 


Wald 


LSE 


QRS 


QRS, 


Wald 


LSE 


1 


0.25 


0.059 


0.065 


0.139 


/ 


0.061 


0.061 


0.113 


/ 




0.5 


0.061 


0.068 


0.130 


0.069 


0.053 


0.053 


0.088 


0.053 


2 


0.25 


0.059 


0.062 


0.145 


/ 


0.055 


0.057 


0.117 


/ 




0.5 


0.057 


0.068 


0.104 


0.072 


0.059 


0.059 


0.097 


0.056 


3 


0.25 


0.064 


0.063 


0.129 


/ 


0.049 


0.049 


0.075 


/ 




0.5 


0.054 


0.064 


0.100 


0.038 


0.049 


0.050 


0.081 


0.030 
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-- QRS QRS 5 ■ • ■ LSE 



Fig. 1. Power curves of different methods for testing Ho : P = at r = 0.5 and n — 30. 



suggests that QRS^ is more powerful than QRS in small samples such as 
n = 30 for inference on (3 E M q with q > 1. Compared to LSE, the median 
approaches are much more powerful for heavy-tailed data (case 3), and they 
perform equally well in finite samples for normal data (cases 1-2), where 
LSE is known to have higher asymptotic efficiency. 

5.3. Inference on the constancy of ai(t). To test whether a\{t) devi- 
ates from a constant, we fix (3 = 1 and generate data from a sequence of 
alternative models 

a±(t, rj) = 2 — 37/ cos 

where 77 determines the extent that a\{t) varies with time. We set rj £ [0, 1.5] 
with corresponding to a model with a constant coefficient ol\ . 

Table 3 shows that QRS, QRS^ and LSE all maintain the level reasonably 
well, even at small samples with n = 30. The QRS and QRS^ give similar 
power at n = 100, but QRS^ is more powerful in small samples n = 30; see 
Figure 2 for typical examples in cases 1 and 3 at r = 0.5. As observed in 
Section 5.2, the median approaches perform competitively with LSE in cases 
1-2, and they are substantially more powerful than LSE in case 3. 

We now summarize the main findings from this simulation study. First, 
the proposed rank score tests perform well in terms of both level and power, 




14 



H. J. WANG, Z. ZHU AND J. ZHOU 



for testing (3 or the constancy of varying coefficients. Second, for small sam- 
ples, QRS^ is more powerful than QRS for testing the constancy of a\(t) and 
for testing (3 £ IR 9 with q > 1 in cases 1 and 3 when the underlying correlation 
is exchangeable. For the data sets generated in this study, QRS^ performs 
competitively even in case 2 when the correlation structure is misspecified. It 
is commonly known that analysis at different quantiles can reveal structural 
heterogeneity that might be overlooked by mean regression methods. When 
the main interest is on the center of the distribution, the proposed median 
approach does not lose much finite sample efficiency for normal data com- 
pared to the mean method, but it performs more robustly for heavy tailed 
data. 

6. Application to AIDS data. 

6.1. Background of the study. We illustrate the proposed quantile ap- 
proach by analyzing a subset from the Multi-Center AIDS Cohort study. 
The data set consists of 283 homosexual males who were infected with HIV 
between 1984 and 1991. Each patient had a different number of repeated 
measurements and different measuring times. Details of the experimental 
design can be found in Kaslow et al. [14]. Several researchers including Fan, 
Huang and Li [5], Huang, Wu and Zhou [13] and Qu and Li [21] have studied 
the same data set to analyze the mean CD4 percentage. Our analysis aims 
to model the effects of smoking status, age and pre-HIV infection CD4 per- 
centage (PreCD4) on different quantiles of the CD4 percentage population. 

Previous analyses on the same data set suggested that smoking status 
and age have no significant effects on the mean CD4 percentage, and the 
baseline has time- varying effect, but it remains unclear whether the effect of 
PreCD4 is constant or not. Therefore, at a given quantile level r, we consider 
a parsimonious PLVC model 

(6.1) yij = a (tij,r) + ai(tij,T)xi + /3i(r)z i) i + /^(r)^ + 



Table 3 

Type I errors for testing the constancy of a± (t) . The nominal significance level is 0. 05 



Case 


T 




n = 30 






n = 100 




QRS 


QRS a 


LSE 


QRS 


QRS, 


LSE 


1 


0.25 


0.037 


0.061 


/ 


0.060 


0.059 


/ 




0.5 


0.052 


0.057 


0.064 


0.049 


0.048 


0.056 


2 


0.25 


0.035 


0.053 


/ 


0.055 


0.059 


/ 




0.5 


0.057 


0.062 


0.070 


0.051 


0.051 


0.067 


3 


0.25 


0.036 


0.055 


/ 


0.047 


0.046 


/ 




0.5 


0.044 


0.056 


0.021 


0.061 


0.065 


0.022 
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0.0 0.5 1.0 1.5 0.0 0.5 1.0 1.5 



ri 11 

-- QRS QRS 5 ■ • ■ LSE 

Fig. 2. Power curves of different methods for testing the constancy of at\(t) at r = 0.5 
and n — 30. 

where yij is the ith patient's CD4 cell percentage at time ty (in years), 
Xi is the centered PreCD4, z^i is the ith patient's smoking status (1 for 
smoker and for nonsmoker), and z< 2 is the ith patient's centered age 
at HIV infection. In this study, we consider a set of quantiles with r G 
{0.1, 0.2, . . . , 0.8, 0.9}. At a fixed quantile level r, the baseline function ao(*> t) 
represents the rth quantile of CD4 percentage t years after the infection for 
a nonsmoker with average PreCD4 percentage and average age at HIV in- 
fection. 

6.2. Model assessment. For comparative purpose, we also consider the 
LCC model assuming that both ao(t, r) and cti(t, r) are constant at a given 
quantile level r. To assess how well the two models fit the data, we consider 
the following model assessment tool by comparing the empirical distribution 
of Y with the simulated distribution from each model. We first generate u 
from U(0, 1). At a fixed time point t* , we randomly choose a subject from 
those who have measurements at t* or close to t* (within a distance of 0.001), 
and denote the corresponding covariates as (x* , 2^ , z|) . The simulated Y* is 
generated as the uth conditional quantile a (t*,u) + a.\{t* ,u)x* + 0i(u)z* + 
P2(u)z2, where (ao(-,u),&i(-,u), Pi(u), ^{u)) are the estimated uth quantile 
coefficients in the model under assessment. Repeating this procedure many 
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Fig. 3. Assessment of the PLVC and LCC models for fitting the CD4 data. The 
line in each plot is y — x. 



times, say 500, we can obtain a simulated sample. If the model fits data 
well, the marginal distribution of the simulated Y* should match that of the 
observed Y. Figure 3 shows the Q-Q plots of the empirical (Y) and simulated 
CD4 percentage (Y*) at t* = 0.2, 0.8, 1.2 and 3.8. The Q-Q plots suggest that 
the PLVC model fits the data universally better than the constant coefficient 
model. 



6.3. Hypothesis testing. At each quantile level r, we consider four null 
hypotheses, Hq\: the baseline curve ao(t,T) is time invariant; Hq2- the 
PreCD4 effect is time invariant; Hq^: smoking has no effect and H04: age has 
no effect. We apply the proposed quantile rank score test QRS^ by assuming 
an exchangeable correlation structure to test each of the four null hypothe- 
ses. The p-values are summarized in the first four rows of Table 4. Figure 
4(a) and (b) plot the estimated baseline and PreCD4 effects at r = 0.1,0.3 
and 0.7. 

Consistent with the mean regression results, our analysis indicates that 
neither smoking nor age has any significant effects, and the baseline curve 
is significantly time-varying at all quantile levels considered. Additionally, 
Figure 4(a) suggests that the CD4 percentage in the lower quantiles, that 
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is, for the group of baseline people with more severe illness, depletes rather 
quickly across the time that was considered. In contrast, the upper quantiles 
of the baseline CD4 percentage drop at a slower rate for the first 3 years, 
and become stable afterward. 

There is some disagreement in terms of hypothesis testing on the con- 
stancy of the PreCD4 effect in literature. Huang, Wu and Zhou [13] reported 
a p- value of 0.059 for testing Hq2, while Qu and Li [21] claimed that the ef- 
fect of PreCD4 is significantly time-varying with a p-value of 0.045. Our 
proposed quantile regression method suggests that the effect of PreCD4 is 
time-decaying with marginal significance at lower quantiles r = 0.2 and 0.3, 
and it tends to be stable for r > 0.3. Such structural heterogeneity would 
not have been revealed by ordinary regression methods focusing solely on 
the conditional mean. 

To examine the constancy of functional coefficients, we also investigate an 
alternative shrinkage approach through minimizing the penalized objective 

function: pAVij ~ ^if'Ci ~ - z'ijP) + where || • ||i denotes 

the L\ norm, and £i and £2 are defined in (4.3). The tuning parameter 
A is chosen by minimizing the Schwarz-type information criterion SIC = 
log{Ey PriVij ~ njfli - - 4-/3)} + l ^fdf, where df is the number 

of coefficients that are not shrunk to zero. If all components of £1 are shrunk 
to zero simultaneously, that is, ||£i||i = 0, the coefficient functions ai(t),l = 
1, . . . , pi, are considered as time- invariant. We apply the shrinkage approach 
to examine the constancy of the baseline and the PreCD4 effects separately. 
The resulting ||£i||i at different quantile levels are summarized in the last 
two rows of Table 4. The results from the shrinkage method agree quite well 

Table 4 

Analysis results for the CD4 data at different quantile levels. The first four rows are the 
p-values from the rank score tests. The last two rows are the resulting ||£i||i from the 
shrinkage approach for examining the constancy of the baseline and PreCD4 effects, 

respectively 



Null hypothesis 


0.1 


0.2 


0.3 


0.4 


0.5 0.7 


0.9 








The 


p-values from the rank score tests 




Hoi: 


constant baseline 


0.000 


0.000 


0.000 


0.000 


0.000 0.000 


0.000 


Hq2'- 


constant PreCD4 


0.264 


0.028 


0.010 


0.332 


0.301 0.324 


0.597 


Ho3'- 


no smoking effect 


0.085 


0.054 


0.476 


0.685 


0.807 0.581 


0.211 


Hoi: 


no age effect 


0.631 


0.525 


0.214 


0.117 


0.212 0.268 


0.418 








The 


||£i ||i from the shrinkage approach 




Hoi: 


constant baseline 


0.781 


0.571 


0.864 


0.663 


0.484 0.687 


0.623 


Ho2'- 


constant PreCD4 


0.295 


0.284 


0.153 


0.000 


0.345 0.000 


0.000 
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Fig. 4. The estimated varying coefficient curves at r = 0.1,0.3 and 0.7. 



with those from the rank score test, both suggesting that the baseline effect 
is time-varying at all quantiles considered, and the PreCD4 effect tends 
to be time-varying at r = 0.2 and 0.3. Further investigation is needed to 
understand the theoretical property of the shrinkage method and to compare 
with the proposed rank score test approach. 



7. Discussion. In this paper, we introduced a marginal quantile partially 
linear model with varying coefficients for analyzing longitudinal data. We 
proposed a simple procedure for estimating the quantile coefficients (3 and 
a(t) using B-spline approximation, and established the asymptotic proper- 
ties of the resulting estimators. We further developed rank score tests for 
some important inferential issues on both (3 and a(t). The proposed rank 
score tests are easy to implement and robust in performance. The estimat- 
ing and inference approach does not require any specification of the error 
distribution or the dependence structure. However, our empirical studies 
suggest that the finite sample performance of the tests could be improved 
by specifying an approximate correlation structure. 

We chose B-spline to approximate the smooth functional coefficients in 
this paper due to its computational efficiency and stability. The B-spline 
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is able to exhibit local features and it provides good approximations with 
a small number of knots. Other smoothing techniques, such as penalized 
regression splines and local polynomial fitting could be employed in our 
model framework, and this deserves further research and comparison. 

The proposed rank score tests can be inverted to construct confidence 
intervals for the parametric coefficient (3; see Koenker [16] for details of such 
confidence interval construction in a different context. However, we have not 
addressed the issue of confidence regions for the functional curves a\ (t) , such 
as intervals at a fixed t or simultaneous confidence bands for all t within a 
range. Resampling of subjects might provide a solution to this problem, but 
further theoretical and practical investigation is needed. 

APPENDIX 

Throughout the appendix, we use || • || to denote the Euclidean norm. 
First note that by assumption Al and Corollary 6.21 of [22], there exists a 
spline approximation ir'(t)9iQ to such that 

(A.l) sup \a l (t)-ir'(t)d l0 \=O(k- r ), 1 = 1,..., p. 

te[o,i] 

To build links between (K n ,A n ) and (K n ,A n ), we define 
P = U(IL'BU)- 1 IL'B, Z* = (I-P)Z, 

n n 

K* = Z*'BZ* = Z*'BiZ*, A* n = Z*'AZ* = ^ Z*'AiZ*. 

i=l 1=1 



Lemma 1. Under the assumptions of Theorem 1, we have 

N- 1 (K* n -K n )=o p (l) and N' 1 ^ - A n ) = o p (l). 

Proof. Let ** = {(p*{X 1 ,T i y,...,(j)*(X n ,T n yy and Z = (Z - $*) + 
<!>* = A + where (ft* is defined in (2.2). We can write 

N^K* = N^iA'BA + $*'(/ - P')B{I - P)$* 
(A.2) +A'{I - P')B{I - P)$* 

+ $*'(/ - P')B(I - P)A - A'P'BPA}. 

By (A.l) and assumption A5, there exists a matrix M such that ||$* — 
LTM|| = O p (n 1 l 2 k~ r ). In addition, as P is a projection matrix, we have 

||(I - P)$*|| = ||$* - EM || + ||IIM - P**|| < 2||** - LIM|| = O p (n 1/2 k~ r ). 
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Let m(X, T) = (m(X 1 ,T 1 )', m(X n , T n )')' , r] = Z-m(X, T) and Q = m(X, 
T) — <3>*. By assumption A5, we have 

(A.3) \\Pn'\\ 2 = O p (tr(P'P)) = O p (k n ), - P) V \\ = O^n 1 / 2 ). 

On the other hand, since </>f(-,-) is the projection of m/(-,-) onto the 
varying coefficient functional space y (i.e., C-L3^) and II £ 3^, we have 

E(WBQ = and E\\n'B(\\ 2 = E \\UiBidfj = O p {k n n), 

where IT and Q are the rows of LT and £, respectively, corresponding to the 
ith subject. Hence ||n'5C|| = O p {kU 2 n 1 / 2 ). By Lemma A. 4 of [15], we have 

IIPCII^IiniUKn'sn)- 1 !!!^!! 

= O p {n l l 2 k- l l 2 )O p {k n /n)O p {{k n n) l l 2 ) = O p (k n ), 
which together with (A.3) gives 
(A.4) \\PA\\=O p (k n ). 

Thus, all the last four terms on the right-hand side of (A. 2) are o p (l), so 
the result is proven as desired. The proof of Af~ 1 (A* — A„) = o p (l) follows 
with similar arguments. □ 

Lemma 2. Under the assumptions of Theorem 1, we have 
(A.5) A- 1 /2 Z *Vr(e)-^iV(0,/ 9 ) ! 
where Vv(e) = (Vv(eii), . . .,^ T (e nm J)'. 

Proof. Similar to (A. 2), we can write 

A; 1 /2 Z * Vr(e) = A -l/2 A>r(e) 

(A.6) 

+ K 1I2 {^\I - P')Me) ~ A'P'W(e)}. 

By (A.l) and (A.4), we have N' 1 E\\<5>*' (I - P')ip T (e)\\ 2 = o(l), and N' 1 x 
£||A'PVv(e)|| 2 = o(l). Thus, the last term on the right-hand side of (A.6) 

is o p (l), and the first term A n 1 ' 2 A'ip T (e) —> N(0,I q ) by assumption A6 and 
the central limit theorem. □ 



Proof of Theorem 1. Let 

?(/?,e) 



V kn 1/2 H n {e - 9 ) + k\l 2 H~ x ^BZ{(3 - (3 ) 
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e= s@, 9) = ($', a')', wh ere fl* = ^n'.Bn, ft, € and 9 = (#zo) G K Pfc " ■ 
We shall show that ||?|| = O p {kn 2 ). To do so, we standardize Z{j = A n ^ 2 K* -1 z*-. 

1/9 -i 

Tfij = fen i?~ Ily. Denote i? n jj = n^O — J2f=i x ij,i a i{Uj) as the bias from 
the spline approximation. Thus, 

n mi 

/Z Yl P-riVij ~ U ij Q ~ Z 'ij$) = zZ Pr( e ij ~ <&ij ~ ~ R nij)- 

i=l j=l ij 



By Lemma A. 5 of [15], || = 0{\Jk n /n). Applying similar argu- 

ments used in Theorem 3.1 of [25], for any e > 0, there exists L e such that 

Pi inf ^2 Pri^ij ~ <i[zij - <^ij - Rnij) > Y M e «j - Rnij) \>l-e. 

I llrll ^ T k 1 ' 2 ■ ■ ■ ■ ) 

v ||S || ->-^e^n IJ IJ 

Using the fact that J2ij Pr(^ij — — dftij ~~ Rnij) is minimized at <f over 

1/2 1/2 

the space R Pn , we have P(||<f|| < L e k n ) > 1 — e, and thus = O p (k n )■ 
This together with Lemma 1, and the definition of <f gives 

\\P-Po\\ = HA^iC- 1 ?!!! =O p (n- 1 /2||^ 1 ||) = O p („-V2 A .i/2). 

On the other hand, by (A.l), there exists constants C/, I = 1, . . . ,p, such 
that 

n-^IMUj)-^)} 2 



< 2N- 1 J2W(tijM ~ io)} 2 + 2Cfk 



2r 

n 



< 2N^\\^\\ 2 + 2\\k 1 J 2 H~ 1 Il'BZ(f3 - A,) || 2 + 2C?k~ 2r 
= Opin-^f) + O p (0 - Po\\ 2 ) + 0(k~ 2r ) = O p (k~ 2r ). 

The proof of (2.3) is hence complete. 

Next, we will establish the asymptotic normality of (3. Let <? \ = A n x 
Ya=i ^*Vr( e i)- Due to Lemmas 1 and 2, <f* is asymptotically normally dis- 
tributed with variance-covariance matrix I q . Therefore, to show (2.4), all we 
need is ||<f* — ?i|| = o p (l). 

By the definitions of <f* and <fi, for any L > and M > 0, we have P(s£ < 

M) -» 1 and P(£i < Lk l J 2 ) -> 1. Let 

Uij(^l, ?i ) = Pri&ij ~ ^l z ij ~ ^2^ij ~ Rnij) ~ Pr(&ij ~ ?l z ij ~ < ?2 7r «i — Rnij)- 

By Lemmas 8.1 and 8.3 of [25], and the orthogonality of Z* and BH, for 
any given 5 > 0, we have 



sup 

ll&-s?||«5 



XXMft'S) + (ft " $)%jMeij) - EUiji^d)} 
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sup 

||fi-?*||<5 



ij 

- i/2{,[K*y 2 K- 1 K^\ l -e 1 l K 1/2 K- 1 K l/2 C 1 ) 



o p (l). 



Therefore, 



sup 

Iki—u II < 5 



£E7i^i,3) + (fc-3)^ty T (e) 



1/2^ A^K^A^t + l/2? 1 * / A; 1 / 2 ^- 1 A; 1/2 ?r 



(A.7) 



sup 

iki-?ni<5 



Op(!)> 



where Z = (%). When - ^|| > <5, (ft - sj/fa - q*) > 0. Then (A.7) 
implies that 



lim inf V Prfej ~ ~ ^ij ~ Rnij 

n->oo ilki-fril^TT 



f) 



(Ai 



> ^""^ PrjCij ~ ?1 -^ij — ?2 7r «j — Rnij) f — 1- 
ij 



By the convexity of the objective function p r (-) and the definition of <fi, (A. 8) 
implies that for any <5 > 0, P(\\si — <^|| > 5) — ► 0, that is, ||<a - if* || = o p (l). 
This completes the proof of Theorem 1 . □ 

Proof of Theorem 2. The proof of Theorem 2 follows the similar 
arguments to those for Theorem 2 of [10], and thus is omitted. □ 

Proof of Theorem 3. Let S* = N' 1 / 2 E"=i{Epi <%Vv(%)}, which 
is a sum of n independent random vectors with mean zero. It's easy to see 
that 



Cov(S;) = N" 1 E Cov E rf ijVv(e 

Vj=i 



I J ' 



1=1 



i=l 



It follows from assumption A7 and the central limit theorem that 
(A.9) V -i/2 S * n ^N(0,I q ). 
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(A.10) 



o„(l). 



Similar to Theorem 2, we can show that under Hq, V n — V n = o p (l), which 
together with (A. 9), (A. 10) completes the proof of Theorem 3. □ 



Proof of Theorem 4. We first define 

n mi 



<=^~ 1/2 EE^A(%)> 
i=ii=i 

where the summands Y^idijipri^ij) ar e independent of each other and 
have zero mean. By Theorem 4.1 of Portnoy [20], 



(A.ll) 



.* _ S*nV n l S*n-{ k n + K]Pl D mn 



where v n = N~ l Y%=\ D' i A i (A)D i . Similar to Theorem 2, we can obtain that 
v n — v n = o p (l) under Hq. Due to (A.ll), it is clear that all we need to show 
is 



(A.12) 



n rrii 



i=lj=l 



:Op(l). 



Let y> = (€' 2 ,(3')', and v? = (^o^o)' such that Vij = U if'^w + w'ijPo + e»j + 
R n ij, where R n ij is the bias from the B-spline approximation satisfying 
Rnij = 0(k~ r ). Denote 

- E{^ T (yij - w' i:j ip) - ijj T {yij - w'ijipo)}]. 

In our context, the dimensions of both d%j and Wij grow with n, while the 
former is finite in [25], and the latter is finite in [15]. By Assumption A. 4 
and Lemma A. 5 of [15], 

(A.13) 



sup||dy|| = O p (\/k n ), SUpH^ijH = O p (y/k n ). 

ij ij 

Let S n be a sequence of positive numbers. It follows from (A.13), and Lemmas 
2.2 and 3.3 of He and Shao [9] that for any L > 0, 



(A.14) sup N~ 1/2 



^2uij(ip,ip ) =O p (k n (5 n logn) 1/2 ). 
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By expanding E{dijtp T (yij — w'^ip)} around ipo for each i,j, and using (A.13) 
and the fact that D'BW = 0, we have 



i.i 



sup N~ 1/2 
\\y>—<po\\<L5 n 

(A.15) 

= 0(n 1 / 2 (k- r+1 5 n + k 3 n / 2 6 2 n ))- 

Theorem 1 provides the consistency of <p for k n ~ n 1 ^ 2r+1 \ which, how- 
ever, is not the necessary condition for the consistency. In fact, for any 

k n = o(n 1 / 3 ), we have \\(p — (fo\\ = O v (n~^^ 2 k\l 2 \ Therefore, it follows from 
(A. 14) and (A.15) that 



(A.16) N~ 1/2 

To show (A. 12), it suffices to show 



ij 



o P (l). 



(A.17) iV" 1/2 E E WMMVij - <j<P°) - = OpQ). 

i=lj=l 

Note that under Hq, yij — w'^cpo = eij + R n ij- Therefore, the uniform ap- 
proximation technique used to obtain (A. 14) is not applicable here, as the 
left-hand side of (A.17) involves unknown bias with dimension of the same 
order as n. An alternative way to prove (A.17) is to show the L\ convergence 
of the left-hand side of (A.17), which, however, is difficult for k n <C n 1 ^ 2r ~ 1 \ 
including k n n 1 /( 2r+1 ); see (A. 23). To bypass the technical difficulty, we 
take an intermediate step. 

Let H be the set of knots used in the estimation. Denote k n as the di- 
mension of N. For easy demonstration, we assume that )b n «n 1 /( 2r+1 ),r>2, 
but the same proof goes through for other k n . Similar to (A.l), we obtain 
that under Hq, supj,- \R n ij\ = 0(k~ r ). By adding more knots into N, we have 
a new set of knots H with its dimension k n ~ n a , where n( r+1 ' / {( 2r + 1 M <^ 
k n <C n 1 / 5 . As H is a subsequence of N, the original B-spline space is a 
subspace of the new one, for which we can construct a set of basis func- 
tions such that the basis functions for the original space is a subset and is 
orthogonal_to the additional basis functions. Using the new set of knots, 
we define W,Wij,(p,(fo and R n ij the same way as W,Wij,tp,tpo and Rnij- 
Let P w = W{W'BW)- l W'B, D = (I - P W )U^ , and d i5 be the row compo- 
nents of D corresponding to the jth measurement of the ith subject. As the 
(h+ l)th order B-splines are (h— l)-times differentiable, similar to (A.l), 
for h > 3, we have 

(A. 18) sup 1 1 dij - dij 1 1 = O p (k~ h+2 + k- h+l k n ) . 

ij 
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Using (A. 18), and the facts that E{ip T (yij — w^ipo)} = E{^ T (Rij + e^)} 
O(Rnij) = 0(k~ r ) and E{ip T (yij - w'^q)} = 0(k~ r ), we have 



N 



-1/2 



(A.19) 



JV-l/2 



ij 



o p (l) and 

Op(l). 



Note that sup i7 - {w^ipo — w'^ipol < sup i7 - |-R n j ? | +sup i? - \R r , 



ing the similar arguments used for (A. 14), we can show that 



0(k~ r ). Apply- 



iV-i/2 



(A.20) 



o p (l). 



For each % and j, we expand the conditional mean E{dijip T (yij — wL^o)|0% 5 
WijjiOy)} around w[^q. Recall that sup^ ^ ||dy|| = O p (k n ' ), sup^- |to^v?o — 
w' i:j ipo\ = O p (k~ r ), yij - w' i;j cpo = eij + R nij with sup^ \R nij \ = O p (k~ r ). Un- 
der assumption A4, we have 

E[dij{^ T (yij - w' i:j ipo) - ^r(yij - wlj&o^Kdij^ij^ij)] 
= <l ; ,.!',,( - w'ijipo) + O p (k 1 J 2 k- 2r ) 

= dijfij(0)(w'ij(po ~ w'ijfo) + Op(kl/ 2 k~ 2r ) uniformly over i and j. 
Therefore, 



^Eldijiipriyij - w^vo) - ipriVij - w' i:j (po)}] 



(A.21) 



N-^ 2 \\E{D'BW^) - E{D'BW<p )\\ + 0{n l ' 2 k x J 2 k- 2r ) 
o(l), 



where the last step comes from the facts that D is orthogonal to both W'B 
and W'B, and that n 1 / 2 kl /2 k- 2r = o(l) for k n = nV(2r+i) with r > 2 and 
k n ^n 1 / 5 . Collecting (A.16) and (A.19)-(A.21), we get 



(A.22) N 1/2 dij{A(Vij ~ w'ij<p) ~ 4>r(yij - w'ijVo)} 



o p (l). 



26 



H. J. WANG, Z. ZHU AND J. ZHOU 



Furthermore, note that \ip T (^ij + Rnij) — Vv(eij)| < /(|e^-| < \R n ij\). By 
(A. 18), we have for k n > „(»-+i)/{(2r+i)r} _ 

iV- 1 / 2 ^ KftMl/y - w^o) - ^T(ey)}|| 
y 

(A. 23) 

< iV" 1 / 2 *^ . .|| = {k- r {k n n) l l 2 ) = o(l), 

ij 

which together with (A. 22) gives (A. 12). The proof of Theorem 4 is thus 
complete. □ 
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